Developer CD Series 1997 January: Mac OS SDK

home *** CD-ROM | disk | FTP | other *** search

/ Developer CD Series 1997 January: Mac OS SDK / Dev.CD Jan 97 SDK2.toast / Development Kits (Disc 2) / OpenDoc / Developer Documentation / Recipes, Tech Notes & Articles / Tech Notes / Utilities Documentation / OpenHash < prev next >

Wrap

Text File | 1995-11-07 | 21.8 KB | 255 lines | [TEXT/ttxt]

OpenDoc™ Utilities Documentation OpenHash: hash table utility by David McCusker © 1995 Apple Computer, Inc. All Rights Reserved. Apple, the Apple logo, and Macintosh are registered trademarks of Apple Computer, Inc. Mac and OpenDoc are trademarks of Apple Computer, Inc. OpenHashTable is a hash table that implements "open" hashing (such that collisions are handled by chaining together the members in a bucket). This permits efficient removal of hash table associations (as opposed to closed hashing, which might require a non-trivial amount of entry movement after a removal). In particular, this implementation of open hashing supports removal during iterations. A table is logically a set of <key, value> pairs, where no key is equal to another. This defines a mapping that associates keys with values. OpenHashTable manages the space for both keys and values; keys and values are copied into and out of the hash table through the interface. The amount of space required for each key and value is specified when the hash table is initialized. Values may be specified as zero-sized, causing the the table to allocate space for keys only. Users might want to do this when a key contains all the information of interest, and a value would be redundant. (Note that keys can not be zero-sized.) If users want to manage the memory for their own keys and values, it is simple to use pointers for the keys and values, and specify the size as sizeof(void*). The only requirements for keys are that they have appropriate OpenHashKeyEqual and OpenHashKeyHash functions, which are related to each other by the following invariant: (*equal)(key1, key2) implies (*hash)(key1) == (*hash)(key2) Assuming a full table, the storage overhead for this implementation is two longwords per <key, value> association. (If OPENHASH_CACHE_HASH is defined, another longword of overhead is added to cache the hash of keys, and the speed of lookup operations and changes in table size will be significantly increased.) This implementation uses mod (%) to calculate buckets from hashes. If masking (&) had been used, the table would have been constrained to sizes that are powers of two. Masking might be slightly faster, but causes poor distribution if the low bits of hashes are not very pseudo-random. The interator, OpenHashTableIterator, which allows deletion while iterating over the associations in an OpenHashTable, is described after all the OpenHashTable methods. Types These types are used by the OpenHashTable interface. typedef ODBoolean (*OpenHashKeyEqual)(const void* key, const void* testKey); typedef ODULong (*OpenHashKeyHash)(const void* key); typedef unsigned int OpenHashSize; // must be unsigned, (similar to size_t) typedef ODBoolean (*OpenHashAction)(void* key, void* value, ODULong slot, void* ref); OpenHashAction supports walking hash tables. Returning true causes a walk to terminate early (false continues the walk). This allows a walk to be invoked just to see if it finds something interesting. Hashing and Equality This sections describes static functions which clients might choose to pass as instances of either OpenHashKeyHash or OpenHashKeyEqual (or use in the implementation of such functions) in order to handle the hash and equal operations required by the hash table. static ODULong OpenHashTable::StdHash(const void* key); A standard hash function useable when keys are pointers or ints. Hash is defined as the pointer cast to an integer. Note that since a pointer to the key is passed, the argument type is actually pointer to a pointer. Returns *(ODULong*) key. static ODULong OpenHashTable::Random(ODULong seed); Pseudo-random number generator with period 2**(31-1) (i.e. all values in 1..2**(31-1) are visted before repetition). Useful for hashing one or more longwords. Derived from published literature on good random number generators (the names Park and Miller come to mind). static ODULong OpenHashTable::HashString(const ODUByte* key, ODULong keyLength); This is a generic hashing utility that might sometimes be used to implement a hash function passed to the OpenHashTable constructor; the keyLength arg must be curried because only key is passed to a hash function. HashString() is conceptually derived from P.J.Weinberger (from the "dragon" book). static ODULong OpenHashTable::RandomOneLong(const void* key); A standard hash function for a 4-byte key, based on Random(). Returns Random( *(ODULong*) key ); static ODULong OpenHashTable::RandomTwoLongs(const void* key); A standard hash function for an 8-byte key, based on Random(). Returns Random( ((ODULong*) key)[0] ^ ((ODULong*) key)[1] ); static ODULong OpenHashTable::HashTwoLongs(const void* key); A standard hash function for an 8-byte key, where the longwords in the keys are likely already largish values (like pointers). Returns ( ((ODULong*) key)[0] ^ ((ODULong*) key)[1] ); static ODBoolean OpenHashTable::StdEqual(const void* key, const void* testKey); A standard equal function useable when keys are pointers or ints. Equality is defined as pointer equality. Note that since a pointer to the key is passed, the argument type is actually pointer to a pointer. Returns ( *(void**) key == *(void**) testKey ); static ODBoolean OpenHashTable::EqualTwoLongs(const void* key, const void* testKey); A standard equal function for an 8-byte key. Compares the first two longwords of each keys as unsigned longs. Making and Breaking This section addresses construction, destruction, intialization, and copying. OpenHashTable::OpenHashTable(OpenHashKeyEqual equal = StdEqual, OpenHashKeyHash hash = StdHash, ODMemoryHeapID heap=kDefaultHeapID); Compare keys with equal, and hash them with hash. All memory is allocated from the specified heap. The equal and hash functions must fulfill the following invariant: (*equal)(key1, key2) implies (*hash)(key1) == (*hash)(key2) OpenHashTable::OpenHashTable(const OpenHashTable& otherTable); Use the same equal, hash, and heap as otherTable. (Does not copy otherTable; use InitAndCopyFrom() to actually copy the contents of otherTable.) void OpenHashTable::SetHeap(ODMemoryHeapID heap=kDefaultHeapID); All memory is allocated from the specified heap. This method does nothing after Initialize() has been called. Generally this function is only called by hash tables delegating to this one, which specify the heap in Initialize() functions instead of in the constructor. ODBoolean OpenHashTable::Initialize(ODULong sizeHint=5, OpenHashSize keySize=sizeof(void*), OpenHashSize valueSize=sizeof(void*), ODBoolean throwErrors=kODTrue); Initialize the table with sizeHint slots. Keys will occupy keySize bytes of space (here-after referred to as key-size), and this amount of space will copied in/out of functions. Values will occupy valueSize bytes of space (here-after referred to as value-size), and this amount of space will copied in/out of functions. valueSize can be zero, causing values to be ignored. The value of throwErrors controls whether exceptions will be thrown when errors occur. Note that if either keys or values have alignment requirements, then keySize and valueSize must be such that keyPtr + keySize and valuePtr + valueSize point to appropriately aligned structures. Returns false only if memory could not be allocated. (If throwErrors, throws kODErrOutOfMemory instead of returning false.) You can alternatively use InitAndCopyFrom() which copies another table. ODBoolean OpenHashTable::InitAndCopyFrom(const OpenHashTable& otherTable); Initialize this table so that it uses the same parameters as otherTable; sizeHint is taken as otherTable.Count(). Then we call this->CopyTable(otherTable) to copy the contents. Returns false only if memory could not be allocated. (If throwErrors, throws kODErrOutOfMemory instead of returning false.) ODBoolean OpenHashTable::CopyTable(const OpenHashTable& otherTable); Copy all of otherTable entries into this table such that the result is the union of the previous contents of this table and otherTable. If this table was empty, the result is a simple copy. If this table contains entries that are also in otherTable, they are replaced with otherTable entries (beware losing pointers and creating garbage this way). otherTable should have the same equal and hash functions, as well as the same key and value sizes. Returns false only if memory could not be allocated. (If throwErrors, throws kODErrOutOfMemory instead of returning false.) The implementation is just an optimized way of iterating over entries in otherTable and calling this->ReplaceEntry() for each. OpenHashTable::~OpenHashTable(); Frees all the memory that the table allocated. However, if clients allocate memory and place pointers in the hash table, ~OpenHashTable has no way to know this and so does not delete such memory. Clients must deleted owned storage before deleting the hash table. Association Operations This section addresses creating, destroying, querying, reading, and writing key/value associations in the hash table. These methods were chosen to resemble the interface of AEHashTable in the Mac toolbox API. ODBoolean OpenHashTable::ReplaceEntry(void* keyPtr, void* valuePtr); Despite the fact that the name implies a pre-existing entry, this method is also the correct way to add an entry for the first time. Replace and/or add an entry that associates key with value. keyPtr must be a pointer to a key, and valuePtr must be a pointer to a value (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). key-size bytes from keyPtr will be copied into the table, and value-size bytes from valuePtr will be be copied into the table (if value-size is zero, valuePtr is ignored). (Note that if you need a copy of the old contents of the key or value from a previous association, you should copy them first using one of GetKey(), GetValue(), or GetKeyAndValue().) If the addition of this entry makes the table's entry count exceed the current capacity of the table, then the table grows by some percentage (say 50%), leaving some empty slots for future additions. False is returned only if memory could not be allocated. (If throwErrors, throws kODErrOutOfMemory instead of returning false.) void OpenHashTable::RemoveEntry(void* keyPtr); Remove the association with the specified key. keyPtr must be a pointer to a key (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). Nothing happens if there was no such association. This operation is fast and cheap because an old entry is simply "unlinked" and set aside for reuse. (Consider using ShrinkToFit() to reclaim space after numerous removals.) (Note that if you need a copy of the old contents of the key or value from a previous association, you should copy them first using one of GetKey(), GetValue(), or GetKeyAndValue().) ODBoolean OpenHashTable::GetValue(void* keyPtr, void* valuePtr) const; Find and return the value associated with key in (*valuePtr); returns false only if there was no association with the given key, or if value-size is zero. keyPtr must be a pointer to a key, and valuePtr must be a pointer to a value (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). If the association is found, and if value-size is nonzero, then value-size bytes are copied to the address indicated by valuePtr; otherwise valuePtr is ignored. ODBoolean OpenHashTable::GetKey(void* keyInPtr, void* keyOutPtr) const; Find the key in the table equal to (*keyInPtr) and return it in (*keyOutPtr); returns false only if there was no association with the given key. keyInPtr and keyOutPtr must be pointers to keys (e.g. if a key is struct Foo, then keyInPtr must be type struct Foo*). If the association is found, then key-size bytes are copied to the address indicated by keyOutPtr; it is safe to use the same pointer for keyInPtr and keyOutPtr because keyInPtr is not read again after keyOutPtr is written. ODBoolean OpenHashTable::GetKeyAndValue(void* key, void* keyPtr, void* valuePtr) const; Roughly equivalent to GetValue(key, keyPtr), GetValue(key, valuePtr), but faster. Either keyPtr or valuePtr can be zero to suppress copying that part of the association. ODBoolean OpenHashTable::Exists(void* keyPtr) const; Return whether there is an association with a key equal to (*keyPtr). keyPtr must be a pointer to a key (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). Sizing Information and Meta Behavior This section describes methods that access table size information, table metainformation, and a method to reduce the space occupied by a table. ODULong OpenHashTable::Count() const; Returns the number of associations (entries) currently in the table. ODULong OpenHashTable::Capacity() const; Returns the current number of slots in the table, where slots >= Count(). The amount this number is greater than Count() is an indication of the amount of extra space for additional entries that is available before growth is necessary. One might check this number in order to decide whether to downsize the table with ShrinkToFit(). ODULong OpenHashTable::Seed() const; Returns a counter which changes every time the table is modified. If you save this value and then call Seed() again later, you can see if the table has been modified in the interim. For example, this is how OpenHashTableIterator knows whether to throw kODErrIteratorOutOfSync. OpenHashSize OpenHashTable::KeySize() const; Returns the size of keys as specified to foo, i.e key-size. OpenHashSize OpenHashTable::ValueSize() const; Returns the size of values as specified to foo, i.e value-size. ODBoolean OpenHashTable::ThrowErrors() const; Returns whether the table throws exceptions on errors. False means table methods should return normally. void OpenHashTable::SetThrowErrors(ODBoolean b); Sets the boolean that controls whether the table throws on errors. ODBoolean OpenHashTable::ShrinkToFit(OpenHashSize extraSlots=3); Change the capacity of the table to Count() + extraSlots. This is the principal means of recovering storage after numerous removals. Presumably an application watches the relationship between Count() and Capacity(), and shrinks the table when count falls below some threshold. Returns false only if memory could not be allocated. (If throwErrors, throws kODErrOutOfMemory instead of returning false.) Testing and Debugging ODBoolean OpenHashTable::Walk(OpenHashAction action, void* ref) const; Call action once for every association in the hash table, passing through ref each time, as well as the slot in the table containing the association. If action ever returns true, the walk is terminated immediately, rather than visiting every association. Walk() only returns true if action returns true (thus, an empty table never calls action and returns false). This function is useful for printing the distribution of keys in a table to evaluate the quality of hashing. (It's not appropriate to expose slots in the table with an iterator.) You cannot add or remove associations during the walk. Comparing Hash Tables Intersection is a particularly useful way of comparing hash tables. ODBoolean OpenHashTable::Intersect(OpenHashTable* ht, OpenHashAction action, void* ref); Walk this table and compare with table ht, calling action (with ref as one argument) once for every association in this table which has a key equal to a key in ht. (Note: the value of slot passed to action is meaningless.) If action ever returns true, Intersect() returns immediately without examining any more associations. Intersect() returns true only if action returns true (thus, for example, if either table is empty then action is never called and Intersect() returns false). If the caller wishes to build a data structure (e.g. a list) that represents the intersection, this can be maintained in ref. Or, if the caller wants a boolean yes or no (is the intersection non-empty?) then action merely needs to always return true (use TrueAction()). Table ht must have the same hash and equal functions as in this table. You cannot add or remove associations in this table during the intersection, but you can in ht. static ODBoolean OpenHashTable::TrueAction(void* k, void* v, ODULong s, void* r); This function ignores its arguments and always returns true. It is meant to be passed as an action argument to Intersect(). <end OpenHashTable> OpenHashTableIterator This class iterates over associations in a hash table (OpenHashTable). OpenHashTableIterator::OpenHashTableIterator(OpenHashTable* table); OpenHashTableIterator::~OpenHashTableIterator(); The constructor and destructor do very little. The constructor only sets up enough information so it can figure out the current state when any of the following methods are called. A freshly constructed iterator behaves as if it has just finished iterating over the hash table, so First() will be necessary before Next(). void OpenHashTableIterator::First(void* keyPtr, void* valuePtr); Resets the iteration and copies the "first" <key, value> pair in the hash table. Either keyPtr or valuePtr can be set to zero to suppress copying that part of the assocation. valuePtr will also be ignored if value-size is zero. keyPtr must be a pointer to a key, and valuePtr must be a pointer to a value (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). keyPtr or valuePtr are ignored if there are no more associations (use IsNotComplete() to check). The iterator and the table are always in sync after First(). void OpenHashTableIterator::Next(void* keyPtr, void* valuePtr); Continues the iteration and copies the "next" <key, value> pair in the hash table. Either keyPtr or valuePtr can be set to zero to suppress copying that part of the assocation. valuePtr will also be ignored if value-size is zero. keyPtr must be a pointer to a key, and valuePtr must be a pointer to a value (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). keyPtr or valuePtr are ignored if there are no more associations (use IsNotComplete() to check). If the table is out of sync with the iterator (has been modified since First() other than by RemoveCurrent()), then exception kODErrIteratorOutOfSync is thrown. ODBoolean OpenHashTableIterator::Current(void* keyPtr, void* valuePtr); Copies the same <key, value> pair as the last call to either First() or Next(). Either keyPtr or valuePtr can be set to zero to suppress copying that part of the assocation. valuePtr will also be ignored if value-size is zero. keyPtr must be a pointer to a key, and valuePtr must be a pointer to a value (e.g. if a key is struct Foo, then keyPtr must be type struct Foo*). keyPtr or valuePtr are ignored if there are no more associations (use IsNotComplete() to check). If the table is out of sync with the iterator (has been modified since First() other than by RemoveCurrent()), then exception kODErrIteratorOutOfSync is thrown. void OpenHashTableIterator::RemoveCurrent(); Efficiently removes the current <key, value> pair from the table. If the table is out of sync with the iterator (has been modified since First() other than by RemoveCurrent()), then exception kODErrIteratorOutOfSync is thrown. ODBoolean OpenHashTableIterator::IsNotComplete() const; Returns true if the last call to First(), Next() or Current() accessed an association in the table. ODBoolean OpenHashTableIterator::IsOutOfSync() const; Returns whether the iterator is out of sync with the table. Users who do not want an kODErrIteratorOutOfSync exception to be thrown from Next(), Current(), or RemoveCurrent() should avoid calling those functions when IsOutOfSync() returns true. (The iterator becomes out of sync when the table is modified other than by RemoveCurrent() since the last call to First().) <end OpenHashTableIterator> OpenHashTable Implementation Illustration Here is a picture of a hash table under the following conditions: * OPENHASH_CACHE_HASH is not defined. * fSlots == 4, fCount == 3 (there is one free slot left in the table) * Keys are single bytes, fKeySize == 1 == sizeof(char). * Values are shorts (2-byte ints), fValSize == 2 == sizeof(short). * The hash of a key is just the byte value of the key. * The table contains the following associations (added in this order): <(char)1, (short) 0x10>, <(char)2, (short) 0x20>, <(char)5, (short) 0x50> * fEntries begins at address 0x100. Note that keys 1 and 5 both hash to the same bucket (1 % 4) == (5 % 4). The PHYSICAL representation of this table is as follows: fTable +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | entry = 0 | entry = 0x108 | entry = 0x104 | entry = 0 | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 0x100: 0x104: 0x108: 0x10C: fEntries +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | fLink = 0 | fLink = 0 | fLink = 0x100 | fLink = 0 | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ fKeys +---+---+---+---+ | 1 | 2 | 5 |???| +---+---+---+---+ fValues +---+---+---+---+---+---+---+---+ | 0x10 | 0x20 | 0x50 |???????| +---+---+---+---+---+---+---+---+ fFreeList == 0x10C // points to last fEntries cell The LOGICAL representation of this table is as follows: +---+ | /| 0 | / | |/ | +---+ 108: 100: | | +---+---+---+---+ +---+---+---+---+ 1 | *---->| 5 | 0x50 | *---->| 1 | 0x10 | / | | | +---+---+---+---+ +---+---+---+---+ +---+ 104: | | +---+---+---+---+ 2 | *---->| 2 | 0x20 | / | | | +---+---+---+---+ +---+ | /| 10C: 3 | / | +---+---+---+---+ |/ | free list -->|???|???????| / | +---+ +---+---+---+---+ The fEntries slots were allocated sequentially as they were pulled off the free list of entries; the slots in fKeys and fValues were allocated in parallel with the fEntries slots (key and value allocation is by entry index). <end>